Reinforcement values of visual patterns compared through concurrent performances
Authors
Abstract
Related articles
Evaluating Concurrent Reinforcement Learners
Assumptions underlying the convergence proofs of reinforcement learning (RL) algorithms like Q-learning are violated when multiple interacting agents adapt their strategies on-line as a result of learning. Empirical investigations in several domains, however, have produced encouraging results. We evaluate the convergence behavior of concurrent reinforcement learning agents using game matrices a...
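As a concrete picture of the setup this abstract describes, the sketch below runs two independent Q-learners in a repeated 2x2 matrix game and checks whether their strategies settle. It is a minimal illustration, not the paper's evaluation protocol; the payoff matrix, learning rate, and softmax temperature are all assumed values.

```python
import numpy as np

# Two independent Q-learners in a repeated 2x2 matrix game. The payoff
# matrix, learning rate, and softmax temperature are assumed values for
# illustration, not taken from the paper.
PAYOFF = np.array([[3.0, 0.0],    # row player's payoff for (own, other) action
                   [5.0, 1.0]])   # a Prisoner's-Dilemma-like structure

def softmax(q_vals, temp=0.5):
    z = np.exp(q_vals / temp)
    return z / z.sum()

rng = np.random.default_rng(0)
q_a, q_b = np.zeros(2), np.zeros(2)
alpha = 0.1

for _ in range(5000):
    a = rng.choice(2, p=softmax(q_a))
    b = rng.choice(2, p=softmax(q_b))
    # Symmetric game: each agent's payoff uses the same matrix with
    # roles swapped. From A's perspective, B's shifting strategy makes
    # the environment non-stationary, violating Q-learning's assumptions.
    q_a[a] += alpha * (PAYOFF[a, b] - q_a[a])
    q_b[b] += alpha * (PAYOFF[b, a] - q_b[b])

print("A's strategy:", softmax(q_a).round(3))
print("B's strategy:", softmax(q_b).round(3))
```

In this dominance-solvable game both strategies do settle, which is the kind of empirical convergence check the abstract alludes to.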
Fast Concurrent Reinforcement Learners
When several agents learn concurrently, the payoff received by an agent is dependent on the behavior of the other agents. As the other agents learn, the reward of one agent becomes non-stationary. This makes learning in multiagent systems more difficult than single-agent learning. A few methods, however, are known to guarantee convergence to equilibrium in the limit in such systems. In this pap...
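The non-stationarity described here is easy to see directly: hold one agent's behavior fixed and watch its reward distribution drift as the other agent learns. A minimal sketch, with an assumed matching-pennies payoff matrix and assumed constants:

```python
import numpy as np

# Non-stationarity from one agent's point of view: A plays a *fixed*
# action in matching pennies while B learns. Payoffs, learning rate,
# and temperature are illustrative assumptions.
PAYOFF = np.array([[1.0, -1.0],   # A's payoff; B's payoff is the negative
                   [-1.0, 1.0]])

def softmax(q_vals, temp=1.0):
    z = np.exp(q_vals / temp)
    return z / z.sum()

rng = np.random.default_rng(1)
q_b = np.zeros(2)
alpha = 0.02
a_fixed = 0                       # A never changes its behavior
rewards = []

for _ in range(2000):
    b = rng.choice(2, p=softmax(q_b))
    r_a = PAYOFF[a_fixed, b]
    q_b[b] += alpha * (-r_a - q_b[b])   # zero-sum: B's reward is -r_a
    rewards.append(r_a)

# The expected payoff of A's unchanged action drifts as B adapts:
print(f"A's mean reward, first 200 steps: {np.mean(rewards[:200]):+.2f}")
print(f"A's mean reward, last 200 steps:  {np.mean(rewards[-200:]):+.2f}")
```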
Concurrent Hierarchical Reinforcement Learning
We describe a language for partially specifying policies in domains consisting of multiple subagents working together to maximize a common reward function. The language extends ALisp with constructs for concurrency and dynamic assignment of subagents to tasks. During learning, the subagents learn a distributed representation of the Q-function for this partial policy. They then coordinate at run...
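A rough Python approximation of the distributed Q-function idea, not the ALisp construct itself: each subagent keeps a local table, coordination picks the joint action maximizing the summed local values, and each table is updated only on its own action. The additive decomposition, even reward split, and all sizes are assumptions of this sketch.

```python
import numpy as np
from itertools import product

# Distributed Q-function for two subagents under a common reward.
# Everything concrete here (sizes, SARSA-style updates, additive
# decomposition) is an assumption of this sketch.
N_STATES, N_ACTIONS, N_AGENTS = 4, 2, 2
q = [np.zeros((N_STATES, N_ACTIONS)) for _ in range(N_AGENTS)]

def coordinate(state):
    # Subagents coordinate by maximizing the *sum* of their local Qs
    # over all joint actions.
    return max(product(range(N_ACTIONS), repeat=N_AGENTS),
               key=lambda acts: sum(q[i][state, a] for i, a in enumerate(acts)))

def update(state, acts, reward, next_state, next_acts, alpha=0.1, gamma=0.9):
    # Each subagent updates only its own table, using the jointly
    # chosen actions; the shared reward is split evenly here.
    for i in range(N_AGENTS):
        target = reward / N_AGENTS + gamma * q[i][next_state, next_acts[i]]
        q[i][state, acts[i]] += alpha * (target - q[i][state, acts[i]])
```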
Interactive Selection of Visual Features through Reinforcement Learning
We introduce a new class of Reinforcement Learning algorithms designed to operate in perceptual spaces containing images. They work by classifying the percepts using a computer vision algorithm specialized in image recognition, hence reducing the visual percepts to a symbolic class. This approach has the advantage of overcoming to some extent the curse of dimensionality by focusing the attentio...
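The core mechanism, reduced to a sketch: a classifier maps each image percept to one of a few symbolic classes, and a small tabular learner operates on those classes instead of raw pixels. The `classify` placeholder below is hypothetical; any image-recognition model could stand in for it.

```python
import numpy as np

# Percept-to-symbol idea: collapse a high-dimensional image to a
# discrete class, then run tabular Q-learning over the classes.
# `classify` is a hypothetical stand-in for an image-recognition model.
N_CLASSES, N_ACTIONS = 10, 4
q = np.zeros((N_CLASSES, N_ACTIONS))
rng = np.random.default_rng(2)

def classify(image: np.ndarray) -> int:
    # Placeholder classifier: any mapping from pixels to a class label.
    return int(image.sum()) % N_CLASSES

def act(image: np.ndarray, epsilon: float = 0.1) -> int:
    s = classify(image)                  # image percept -> symbolic state
    if rng.random() < epsilon:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(q[s]))          # greedy over the small table
```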
On-policy concurrent reinforcement learning
When an agent learns in a multiagent environment, the payoff it receives is dependent on the behavior of the other agents. If the other agents are also learning, its reward distribution becomes non-stationary. This makes learning in multiagent systems more difficult than single-agent learning. Prior attempts at value-function based learning in such domains have used off-policy Q-learning that do ...
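The distinction at issue can be stated in two lines of update targets. A minimal sketch, with assumed table sizes; only the difference between the two bootstrap terms matters:

```python
import numpy as np

# The on-/off-policy distinction reduced to the two bootstrap targets.
# Table size and discount factor are assumptions of this sketch.
q = np.zeros((5, 2))
gamma = 0.9

def q_learning_target(r, s_next):
    # Off-policy: bootstrap from the greedy action, regardless of what
    # the exploring policy actually executes next.
    return r + gamma * np.max(q[s_next])

def sarsa_target(r, s_next, a_next):
    # On-policy: bootstrap from the action the current policy really
    # takes, so estimates track the executed (exploring) policy.
    return r + gamma * q[s_next, a_next]
```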
Journal
Journal title: Journal of the Experimental Analysis of Behavior
Year: 1972
ISSN: 0022-5002
DOI: 10.1901/jeab.1972.18-281